Audio/video synchronization in a digital transmission system



A receiver (FIGURE 2) for decoding associated
compressed video and audio information components,
transmitted in mutually exclusive "frames" of data
with respective time stamps PTSvid and PTSaud,
includes a controller (216) which is responsive to
the respective received time stamps to provide coarse
synchronization by delaying or skipping respective
frames of one or the other of the components to
approximately time align the two components. Fine
synchronization is provided by adjusting the
processing or clock frequency (215) of the audio
signal processor (212) independent of the video
processor (214). The control for the frequency
adjustment is related to the difference between audio
and video time stamps.
(54) Synchronizing system for time-divided video and audio signals.
(57) A synchronizing system with a simple structure
accomplishes synchronous reproduction
without complicating a control circuit for 
synchronizing video and audio signals with 
each other. The number of unit audio data 
blocks to be put in one pack is set in such a way 
that the difference between the presentation 
start times for the stream of video data and the 
stream of audio data in one pack in a predeter- 
mined pack period becomes a predetermined 
value, and positional information indicating the
position of the pack within the predetermined pack
period is affixed to the pack. In a reproducing apparatus,
the difference between presentation start times 
for video signals and audio signals in each pack 
is acquired by referring to positional infor- 
mation (AAU sequence number) in a stream of 
packs, transferred by the above transmission 
method, and at least one of the presentation 
start times for video signals and audio signals in 
the stream of packs is controlled so that the 
difference between the presentation start times 
coincides with the difference between the pres- 
entation start times corresponding to the 
positional information. 
BACKGROUND OF THE INVENTION

1. Field of the Invention 

The present invention relates to a synchronizing 
system which reproduces a video signal and an audio 
signal synchronously in a system that transfers coded 
video and audio signals in time-division multiplexing. 

2. Description of Background Information

As a method of recording, reproducing or transferring
compressed and coded video and audio signals and
other data in time-division multiplexing, there is the
MPEG (Moving Picture Experts Group) scheme, which
conforms to ISO 11172.

The compressive coding of video signals in this 
scheme employs predictive coding in combination 
with motion compensation, and discrete cosine trans- 
form (DCT). 

The method described in the ISO 11172 requires 
that a counter having many bits should be provided in 
the reproducing apparatus, and that decoding timing 
should be so controlled as to start the presentation of 
decoded data as a video image or voices and sound 
when the value of the counter coincides with the pre- 
sentation time stamp (PTS). The control circuit there- 
fore becomes complicated. 

SUMMARY OF THE INVENTION 

It is therefore an object of the present invention 
to provide a synchronizing system with a simple 
structure, which can accomplish synchronous repro- 
duction without complicating a control circuit for syn- 
chronizing video and audio signals with each other. 

To achieve the above object, according to one as- 
pect of this invention, there is provided a method of 
transmitting time-divided video and audio signals, 
comprising the steps of coding a predetermined time 
slot of video signals to form a stream of video data; 
coding a predetermined number of samples of audio 
signals to form a unit audio data block and forming a 
stream of audio data consisting of unit audio data 
blocks whose quantity approximately corresponds to 
the predetermined time slot; performing time-division 
multiplexing on the stream of video data and the 
stream of audio data, storing resultant data in a pack 
having the predetermined time slot and transferring 
video signals and audio signals in a stream of packs; 
and setting the quantity in such a way that a differ- 
ence between presentation start times for the stream 
of video data and the stream of audio data in one pack 
in a predetermined pack period becomes a predeter- 
mined value, and affixing positional information of the 
pack in the predetermined pack period to the pack. 

According to another aspect of this invention, 
there is provided a method of reproducing time-divided
video and audio signals, comprising a step of
referring to positional information from a stream of
packs, transferred by the above transmission method,
to control at least one of presentation start times
for video signals and audio signals in the stream of
packs in such a manner that a difference between the
presentation start times for the video signals and
audio signals coincides with a difference between
presentation start times corresponding to the
positional information.

According to the method of transmitting time-divided
video and audio signals, the number of unit
audio data blocks to be put in one pack is set in such
a way that the difference between the presentation
start times for the stream of video data and the
stream of audio data in one pack in a predetermined
pack period becomes a predetermined value, and
positional information indicating the position of the
pack within the predetermined pack period is affixed
to the pack.

According to the method of reproducing time-divided
video and audio signals, the difference between
presentation start times for video signals and
audio signals in each pack is acquired by referring to
positional information in a stream of packs, transferred
by the above transmission method, and at least
one of the presentation start times for video signals
and audio signals in the stream of packs is controlled
so that the difference between the presentation start
times coincides with the difference between the
presentation start times corresponding to the positional
information.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a diagram showing the directions of
prediction between frames of video signals in the
compressive coding which conforms to ISO 11172;

Fig. 2 is a diagram showing the transmission
state of a video stream which conforms to ISO 11172;

Fig. 3 is a diagram exemplifying the multiplexing
of various kinds of data that is specified by the
system part of MPEG which conforms to ISO 11172;

Fig. 4 is a diagram for explaining various time
stamps and reference time information, showing
how a stream of multiplexed packs in Fig. 3 is
reproduced;

Fig. 5 is a diagram showing a data format in a
method of recording compressed and coded data;

Fig. 6 is a schematic block diagram of an encoder
which accomplishes a method of making the
amount of data of GOP (Group Of Pictures) constant
and which is employed in one embodiment
of the present invention;

Fig. 7 is a time chart for explaining the operation
of a buffer memory of the encoder shown in Fig. 6;

Fig. 8 is a diagram showing the format of a pack
in a transmission method according to one embodiment
of the present invention and the time-sequential
relation between actual video information
and audio information in that pack;

Fig. 9 is a diagram showing the relation among an
AAU (Audio Access Unit) sequence number, the
number of AAUs in one pack, and the difference
between presentation start times for video signals
and audio signals in a reproducing apparatus; and

Fig. 10 is a block diagram of a reproducing apparatus
which accomplishes a reproducing method
according to another embodiment of the present
invention.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

Before discussing a preferred embodiment of the 
present invention, the conventional compressive cod- 
ing method will be described referring to the accom- 
panying drawings. 

An image coded by the MPEG scheme consists 
of an I picture (Intra coded picture) coded within a 
frame, a P picture (predictive coded picture) obtained 
by coding the difference between the current image 
and an old picture (decoded image of an I or P picture) 
and a B picture (Bidirectionally predictive coded pic- 
ture) obtained by coding the difference between the 
current image and an interpolated image which is pre- 
dicted bidirectionally from old and future images. The 
predictive directions are illustrated in Fig. 1. 

Referring to Fig. 1, coded frame images are symbolized
as parallelograms frame by frame. Those
frame images correspond to consecutive frames of
input video signals, and "I", "P" and "B" affixed to the
frame images indicate the aforementioned types of
pictures of the frame images. The arrowheads indicate
the directions of prediction between frames.

A certain video sequence unit is collectively
called "GOP" (Group Of Pictures). As one example,
15 frames are treated as this unit in Fig. 1 and are
sequentially given frame numbers.

The compression efficiency in this coding varies
with the coding scheme of the individual picture
types. The compression efficiency is the highest for
B pictures, the next highest for P pictures and the
lowest for I pictures. After compression,
therefore, the I picture has the largest amount of data,
the P picture has the next largest amount of data, and
the B picture has the smallest amount of data. The
amounts of data of each frame and each GOP are not
constant and differ depending on the video information
to be transmitted.



While the order of uncompressed frames is as
shown in Fig. 1, the order of compressed frames at
the time of transmission becomes as shown in Fig. 2
for the purpose of reducing the delay time in the
decoding process.

Portions (a) and (b) in Fig. 2 conceptually illustrate
each coded frame image in view of the amount
of data after compression, and the picture types I, P
and B and frame numbers correspond to those shown
in Fig. 1. The coded video signals are arranged in the
order of frame numbers as illustrated, and a sequence
header SQH can be affixed to ensure independent
reproduction GOP by GOP as shown in a
portion (c) in Fig. 2. The sequence header, which is
located at least at the head of a stream of data or a
video stream as shown in the portion (b) in Fig. 2,
describes information about the entire video stream.
The sequence header may be affixed to the head of
every GOP to ensure reproduction of data from a
middle part of each GOP, and includes initial data needed
for the decoding process, such as the size of an image
and the ratio of the vertical pixels to the horizontal
pixels. A video stream to be transferred to a decoder
is formed in the above manner.

The system part of the MPEG further specifies a
scheme of multiplexing a compressed audio stream
and a stream of other data in addition to the
aforementioned compressed video stream and
accomplishing the synchronized reproduction of those
streams.

Fig. 3 exemplifies the multiplexing of various 
kinds of data, which is specified by the system part 
of the MPEG.

In Fig. 3, a portion (a) indicates a data stream of
coded video signals consecutively arranged in the
order of GOPs as indicated by the portion (c) in Fig. 2,
i.e., a video stream, and a portion (b) indicates a data
stream of audio signals that are compressed and coded
by a predetermined coding scheme which will not
be discussed in detail. Partial data of each stream is
stored in a packet together with a packet header
located at the head of the packet. A packet in which
video stream data is stored is called a video packet (VP),
and a packet in which audio stream data is stored is
called an audio packet (AP). Likewise, a packet in
which a stream of data other than video and audio
signals, such as control data, is stored is called a data
packet (DP) though not illustrated.

Some of those packets are grouped as a pack
with a pack header placed at the head of this pack.
The packets are transmitted pack by pack in the form
shown in a portion (c) in Fig. 3. In the packet
transmission, the pack header serves as a system header
(SH) which describes information about the whole
pack stream and includes a pack start code PS and
a system clock reference SCR that indicates the
reference of time. The packet header includes a
presentation time stamp PTS and a decoding time stamp
DTS as needed. A pack is the collection of individual
partial streams each corresponding to a packet.

SCR in the pack header is the number of system
clocks of 90 kHz counted from some point of time,
and is used as a reference of time in reproducing the
associated pack. PTS in the packet header represents,
by the number of the system clocks counted, the time
at which the presentation of the packet containing
that PTS as a video image or sound and voices starts.
DTS represents the time at which decoding
of the packet containing that DTS starts. For B pictures
in a video packet and an audio packet, the time
data of PTS equals that of DTS so that DTS need not
particularly be described. For I and P pictures in a
video packet, since the presentation starting time lags
behind the decoding starting time due to the
rearrangement of the frames in the opposite order to the
one shown in Fig. 2, PTS and DTS should be inserted as
needed. PTS or a combination of PTS and DTS is
inserted in a stream of video and audio packets at an
interval of 0.7 sec or below.
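
The 90 kHz time base described above can be illustrated with a short sketch. It converts durations to system-clock counts and checks the 0.7 sec spacing rule for time stamps. The function names are hypothetical, and the frame period is assumed to be exactly 1001/30000 sec (the usual reading of 1/29.97 sec, which is also consistent with 15 frames lasting 0.5005 sec).

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 90_000  # 90 kHz system clock named in the text

def seconds_to_ticks(seconds):
    """Convert a duration in seconds to a whole number of 90 kHz counts."""
    return round(Fraction(seconds) * SYSTEM_CLOCK_HZ)

def ticks_to_seconds(ticks):
    """Convert a 90 kHz count back to an exact duration in seconds."""
    return Fraction(ticks, SYSTEM_CLOCK_HZ)

# Assumption: one frame lasts 1001/30000 sec (about 1/29.97 sec).
FRAME_TICKS = seconds_to_ticks(Fraction(1001, 30000))
# Time stamps must appear at intervals of 0.7 sec or below.
MAX_PTS_INTERVAL_TICKS = seconds_to_ticks(Fraction(7, 10))

def pts_spacing_ok(pts_values):
    """True if consecutive PTS values are at most 0.7 sec apart."""
    return all(later - earlier <= MAX_PTS_INTERVAL_TICKS
               for earlier, later in zip(pts_values, pts_values[1:]))
```

Under these assumptions one frame corresponds to 3003 counts, so time stamps placed once per 15-frame GOP (45045 counts) satisfy the spacing rule.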

In reproducing such a stream of packs, the value
of SCR is loaded into a counter in a reproducing
apparatus and thereafter the counter starts counting the
system clock and is used as a clock. When PTS or
DTS is present, each packet is decoded with such timing
that its presentation as a video image or sound and
voices starts when the value of the counter coincides
with PTS. With no PTS and DTS present, each packet
is decoded following the decoding of the previous
packet of the same kind.

The above will be conceptually explained below.
Suppose that SCR of a pack 1 has been input at time
t11 based on the system clock indicated in a portion
(b) in Fig. 4 which illustrates the reproduction state of
the stream of packs that are denoted by the same
shapes and reference numerals as used in Fig. 3.
Time data t11 is described in the SCR. As video
stream data whose presentation starts from time t12
is stored in data DATA11 in the first packet in the pack
1, time data t12 is described in PTS of that packet. As
audio stream data whose presentation starts from
time t13 is stored in data DATA13 in the third packet
in the pack 1, time data t13 is described in PTS of that
packet. As the end portion of GOP1 whose presentation
starts from time t15 and the head portion of
subsequent GOP2 are stored in data DATA14 in the
fourth packet in the pack 1, time data t15 is described
in PTS of that packet. For the subsequent packets,
SCR and PTS are described in the same manner. A
portion (c) in Fig. 4 shows presented video signals
and a portion (d) presented audio signals. Although
no PTS is described in the header of that packet which
stores packet data DATA12, such a description is
unnecessary as long as PTS is inserted at the
aforementioned interval of 0.7 sec or below. Assuming that
GOP has the structure shown in Fig. 2, then the packet
data DATA11 is stored from the data of the first I
picture of the GOP1, so that a value equivalent to the
time earlier by three frames than PTS is described in
DTS in the packet header of the DATA11.

As mentioned earlier, the method described in
the ISO 11172 has the shortcomings that a counter
having many bits should be provided in the reproducing
apparatus, and that decoding timing should be so
controlled as to start the presentation of decoded
data as a video image or sound and voices when the
value of the counter coincides with the presentation
time stamp (PTS), thus complicating the control
circuit.

The present invention will now be described in 
detail referring to the accompanying drawings. 

First, the length of one pack is set equal to the 
time slot for one GOP (e.g., 15 frames) of video sig- 
nals, which are stored in a compressed form in video 
packets in one pack. 

This is accomplished by the following method. 

Fig. 5 is a diagram showing a data format in a 
method of recording compressed and coded data ac- 
cording to such a method. 

In Fig. 5, the amount of data of video signals after 
compression as shown in (b) in Fig. 2 differs frame by 
frame but should always be constant in one GOP. A 
scheme for making the amount of data in a single 
GOP constant will be discussed later. A portion (a) in 
Fig. 5 shows the amount of data for each frame in a 
GOP, with the vertical scale representing the amount 
of data and the horizontal scale representing frames 
3I to 14B. The data of the GOP is stored as a video
packet VP in a pack together with an audio packet AP 
and a data packet DP as indicated in a portion (b) in 
Fig. 5. The size of one logical block (portion (c) in Fig. 
5) of a predetermined recording medium on which 
such packets are recorded is 2048 bytes, and one 
pack has a size of 2048 x 144 bytes (144 logical 
blocks). In one pack, the system header SH including 
the pack start code PS and system clock reference 
SCR, and a data packet DP occupy 2048 x 12 bytes,
an audio packet AP occupies 2048 x 8 bytes and four 
video packets VP occupy 2048 x 124 bytes. 
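
The block budget above can be checked with a few lines of arithmetic. The sketch below only restates the figures given in the text; the dictionary keys and the function name are illustrative.

```python
LOGICAL_BLOCK_BYTES = 2048  # size of one logical block
PACK_BLOCKS = 144           # logical blocks per pack

# Logical blocks used by each part of a pack, as listed in the text.
STEREO_LAYOUT = {
    "system header + data packet": 12,
    "audio packet": 8,
    "video packets": 124,
}

def pack_bytes(layout):
    """Return the pack size in bytes after checking that the parts fill it."""
    blocks = sum(layout.values())
    if blocks != PACK_BLOCKS:
        raise ValueError(f"layout uses {blocks} blocks, expected {PACK_BLOCKS}")
    return blocks * LOGICAL_BLOCK_BYTES
```

The three parts add up to exactly 144 logical blocks, i.e. 294912 bytes per pack.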

The upper limits of the bit rates for audio and video
signals after compression are as follows.

audio: 2048 x 8 x 8/0.5005 = 261.88 (kbps)
video: 2048 x 124 x 8/0.5005 = 4.059 (Mbps)
The above bit rates are sufficient to transmit two 
channels of high-quality audio signals and video sig- 
nals having a high image quality. To provide four 
channels of audio signals, the size of the data packet 
DP should be changed to 2048 x 4 bytes and an audio 
packet of 2048 x 8 bytes should be added so that each 
pack contains two systems of audio signals. 
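
The two upper limits follow directly from the packet sizes and the 0.5005 sec pack period, as the sketch below reproduces. The four-channel layout is an interpretation: the text does not say whether the 2048 x 4 bytes includes the system header, so the key name is an assumption.

```python
LOGICAL_BLOCK_BYTES = 2048
PACK_SECONDS = 0.5005  # time slot of one pack (15 frames)

def max_bit_rate(blocks_per_pack):
    """Upper limit in bits per second for data occupying the given blocks."""
    return blocks_per_pack * LOGICAL_BLOCK_BYTES * 8 / PACK_SECONDS

audio_kbps = max_bit_rate(8) / 1_000
video_mbps = max_bit_rate(124) / 1_000_000

# Assumed four-channel variant: header and data shrink to 4 blocks and a
# second 8-block audio packet is added, keeping the total at 144 blocks.
FOUR_CHANNEL_LAYOUT = {
    "system header + data packet (assumed)": 4,
    "audio packet 1": 8,
    "audio packet 2": 8,
    "video packets": 124,
}
```

Both layouts leave the video allocation of 124 logical blocks unchanged, so the video limit is the same in the two-channel and four-channel cases.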

The number of physical blocks (portion (d) in Fig.
5) of the predetermined recording medium varies
depending on the error correcting system, particularly,
the property of a burst error and the size of redundancy
allowed by the error correction code in the recording
and reproducing system for the recording medium.
For instance, when one physical block has a size of
2^16 = 65536 bytes, one pack has four and a half physical
blocks, and when one physical block has a size of 2^15
= 32768 bytes, one pack has nine physical blocks.

As the audio packet AP contains compressed audio
signals which should be reproduced at substantially
the same time as the GOP, decoding the audio
signals and reproducing them in synchronism with the
video signals require a buffer memory which has a
capacity to store at least one packet of audio signals
plus audio signals for the decoding delay of video
signals. Because the audio signals carry a small amount
of data, however, the buffer memory can have a small
capacity.

The following exemplifies a method of making the 
amount of data in a video stream constant in one GOP 
as mentioned above. 

Fig. 6 presents a schematic block diagram of an
encoder which accomplishes this method.

In Fig. 6, the encoder comprises a frame order
changing section 11, a motion detector 12, a
differentiator 13, a discrete cosine transformer (DCT) 14, a
quantizer 15, a variable length coder (VLC) 16, a
multiplexer 17, a buffer memory 18, an inverse quantizer
19, an inverse DCT 20, an adder 21 and a frame
accumulating and predicting section 22. The predicting
section 22 detects the motion vector, and determines
the prediction mode. The inverse DCT 20, inverse
quantizer 19 and adder 21 constitute a local decoder.

The basic function of this encoder is to perform
discrete cosine transform (DCT) of an input digital
video signal by the DCT 14, quantize the transformed
coefficient by the quantizer 15, encode the quantized
value by the VLC 16 and output the coded data as a
video stream via the buffer memory 18. The DCT,
quantization and coding are carried out in accordance
with the detection of the motion vector, the
discrimination of the prediction mode, etc., which are
accomplished by the local decoder, the predicting section 22
and the motion detector 12.

While the basic structure and function of this
encoder are described in the specifications of the
aforementioned ISO 11172, the block which makes the
amount of data in one GOP in the output video stream
constant will be discussed in the following description.

This block comprises a code amount calculator
23, a quantization controller 24, a stuffing data
generator 25, and a timing controller 26. The code
amount calculator 23 obtains the amount of stored
data occupying the buffer memory 18 and calculates
the accumulated amount of coded video data (amount
of codes) at the input section of the buffer memory 18,
counted from the head of the GOP. The
quantization controller 24 determines the quantizer
scale for each predetermined unit obtained by dividing
one frame by a predetermined size in accordance
with the amount of the stored data and the amount of
accumulated data, and controls the amount of coded
data. The stuffing data generator 25 generates
predetermined stuffing data in accordance with the
amount of accumulated data. The timing controller 26
generates timing signals necessary for the individual
sections, such as a horizontal sync signal Hsync, a
frame sync signal FRsync and a GOP sync signal
GOPsync, based on the input digital video signal. The
quantizer 15 quantizes the coefficient after DCT,
divides this value by the quantizer scale obtained by the
quantization controller 24, and then outputs the
resultant value. The quantizer scale becomes an input to
the multiplexer 17. The output of the stuffing data
generator 25, which will be discussed later, is also one
input to the multiplexer 17.

The buffer memory 18 functions as illustrated in
Fig. 7. A variable amount of coded data is generated
and written in the buffer 18 at times 0, 1T, 2T and so
forth (T: frame period). In this diagram, the arrows and
their lengths respectively represent the writing
directions and the amount of data in the memory 18. The
data is read out from the buffer memory 18 at a
constant rate. This is represented by the inclined, broken
lines in the diagram. The writing and reading are
repeated in the illustrated manner. The code amount
calculator 23 obtains the amount of data occupying
the buffer memory 18, and the quantization controller
24 alters the quantizer scale of the quantizer 15
based on the amount of occupying data in such a way
that the buffer memory 18 does not overflow or
underflow, thus controlling the amount of data to be
input to the buffer memory 18. As the quantizer scale
of the quantizer 15 increases, the amount of output
data therefrom decreases. As the quantizer scale
decreases, on the other hand, the amount of output data
from the quantizer 15 increases. The image quality,
however, varies inversely with the quantizer scale. This
control on the amount of codes is also described in the
specifications of the ISO 11172 as a method of
transferring, at a constant rate, a variable amount of coded
data generated frame by frame.
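
The relation between buffer occupancy and quantizer scale can be sketched as a simple mapping. This is only an illustration of the principle, not the control law of ISO 11172 or of this embodiment: the linear mapping and the function name are assumptions, while the range 1 to 31 is the range of the MPEG quantizer scale.

```python
MIN_SCALE = 1   # finest quantization, most output data
MAX_SCALE = 31  # coarsest quantization, least output data

def scale_from_occupancy(occupancy_bits, capacity_bits):
    """Pick a quantizer scale that grows as the buffer fills (assumed linear)."""
    fullness = min(max(occupancy_bits / capacity_bits, 0.0), 1.0)
    return MIN_SCALE + round(fullness * (MAX_SCALE - MIN_SCALE))
```

A nearly empty buffer yields a small scale, so more data is generated and underflow is avoided; a nearly full buffer yields a large scale, so less data is generated and overflow is avoided, at the cost of image quality.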

As the amount of data in each GOP is constant 
in this embodiment, the following control is carried out 
in addition to the above-described control on the 
amount of data. 

The value of the quantizer scale may be deter- 
mined as follows. 

Under the condition that the amount of data
in one GOP be made constant, the quantization
controller 24 calculates the amount of data that should
have accumulated from the head block of the GOP to
the block immediately preceding the current block
(expected amount of accumulated data), based
on the amount of data set previously block by block.
The quantization controller 24 obtains the difference
between this expected amount of accumulated data
and the amount of data obtained by the code amount
calculator 23, i.e., the amount of accumulated data
actually coded and generated from the head block of the
GOP to the block immediately preceding the current
block (actual amount of accumulated data), and
determines the value of the quantizer scale in accordance
with the positive or negative sign of that difference
and the absolute value thereof, so that the actual amount
of accumulated data approaches the expected amount
of accumulated data as closely as possible without
exceeding it. The top of each GOP is indicated by the
GOP sync signal GOPsync from the timing controller
26.
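
One possible reading of this rule is sketched below. The text specifies only that the sign and absolute value of the difference determine the quantizer scale; the proportional step, the gain value and the function name are assumptions made for illustration.

```python
MIN_SCALE, MAX_SCALE = 1, 31  # range of the MPEG quantizer scale

def next_scale(scale, expected_bytes, actual_bytes, gain=1 / 4096):
    """Move the quantizer scale according to the accumulated-data difference.

    Hypothetical control law: the step is proportional to the absolute
    difference, and any overshoot raises the scale by at least one so that
    the actual amount is pushed back below the expected amount.
    """
    difference = actual_bytes - expected_bytes  # positive means over budget
    step = round(abs(difference) * gain)
    if difference > 0:
        scale += max(step, 1)  # coarser quantization, less data
    elif difference < 0:
        scale -= step          # finer quantization, more data
    return min(max(scale, MIN_SCALE), MAX_SCALE)
```

The asymmetry reflects the requirement that the actual amount should approach the expected amount from below: being over budget always triggers a correction, while being slightly under budget does not.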

The amount of data for each block may be set in 
the following manner. 

(1) The ratio of the amounts of data of I, P and B 
pictures for each frame is determined. 

For example, I:P:B = 15:5:1.

(2) The amount of data of each frame determined 
by the ratio given in the above process (1) is 
evenly allocated to the individual blocks in one 
frame. 
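
Steps (1) and (2) can be sketched as follows. The GOP pattern of one I, four P and ten B pictures, the use of the full 2048 x 124 bytes as the GOP budget, and the number of blocks per frame are assumptions chosen for illustration; only the ratio 15:5:1 comes from the text.

```python
RATIO = {"I": 15, "P": 5, "B": 1}  # example ratio given in step (1)

# Assumed 15-frame GOP in transmission order: 1 I, 4 P and 10 B pictures.
GOP_TYPES = list("IBBPBBPBBPBBPBB")

def frame_budgets(picture_types, gop_bytes, ratio=RATIO):
    """Step (1): split the GOP budget among frames by picture-type ratio."""
    unit = gop_bytes // sum(ratio[t] for t in picture_types)
    return [ratio[t] * unit for t in picture_types]

def block_budgets(frame_bytes, blocks_per_frame):
    """Step (2): allocate a frame budget evenly to its blocks."""
    return [frame_bytes // blocks_per_frame] * blocks_per_frame

budgets = frame_budgets(GOP_TYPES, 2048 * 124)
i_frame_blocks = block_budgets(budgets[0], 30)  # 30 blocks is hypothetical
```

Because integer division is used, the budgets add up to slightly less than the GOP size; in this sketch the remainder would simply become part of the shortfall that is later filled with stuffing data.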

When coding of all the frames of one GOP is fin- 
ished, the actual amount of accumulated data is equal 
to or smaller than the expected amount of accumulat- 
ed data. To completely match the expected amount of 
accumulated data with the amount of data in the video 
data stream in one GOP period, an insufficient 
amount is compensated by stuffing data (e.g., dummy
data consisting of all "0") generated by the stuffing
data generator 25.
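
The compensation step amounts to padding the coded GOP up to the fixed target size. The sketch below appends zero bytes at the end, which is a simplification: in an actual ISO 11172 bit stream the stuffing must be placed at positions where the syntax permits it. The function name is hypothetical.

```python
def pad_gop(coded, target_bytes):
    """Append all-zero stuffing so the GOP occupies exactly target_bytes."""
    shortfall = target_bytes - len(coded)
    if shortfall < 0:
        raise ValueError("coded GOP exceeds the fixed GOP size")
    return coded + bytes(shortfall)  # bytes(n) yields n zero bytes
```

Raising an error on overshoot mirrors the requirement that the actual amount of accumulated data never exceed the expected amount.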

In the coding system which conforms to the ISO 
11172, a bit stream has a plurality of positions where 
a proper amount of stuffing bits having a predeter- 
mined bit pattern can be inserted, and the bit stream 
is defined so that the presence of stuffing bits and the
length thereof can be discriminated. For example, a 
stream of MB STUFF (macroblock stuffing) data of a 
macroblock layer or the like is used. Further, the 
quantizer scale is also defined to be inserted in the bit 
stream when it is transmitted. For example, a stream 
of QS (quantizer scale) of a slice layer is used. 

The decoder, which decodes a video data stream 
that includes the stuffing data and quantizer scale 
and has a constant amount of GOP data, detects va- 
rious headers inserted in the input bit stream (such as 
the sequence header, GOP start code, picture start 
code and slice start code), and is synchronized with 
this bit stream. The decoder performs decoding of 
each block in the bit stream by referring to the quan- 
tizer scale and performs no decoding on stuffing data 
when detected, i.e., the stuffing data is not decoded 
as video or audio signals or other information. In other 
words, the decoder disregards the stuffing data and 
can thus perform decoding without particularly exe- 
cuting the above-described data amount control to 
make the amount of data in each GOP constant.

The essential features of the present invention 
will now be described with reference to Figs. 8 
through 10. 



Fig. 8 is a diagram showing the format of a pack 
in a transmission method according to one embodi- 
ment of the present invention and the time-sequential 
relation between actual video information and audio
information in that pack.

As shown in (a) in Fig. 8, one GOP, i.e. 15 frames
of video information, stored in a video packet VP of
such a pack, can be considered as having a time slot
of 0.5005 sec as one frame has a time slot of 1/29.97
sec. Thus, the time slot of a pack is set equal to
0.5005 sec, the time slot of the video packet VP.

According to the system that conforms to the ISO
11172, 1152 samples of two (R and L) channels of audio
signals as shown in (b) and (c) in Fig. 8 are treated
as one AAU (Audio Access Unit), and are compressed
to data having a fixed length for each AAU. Given that
the sampling frequency of audio signals is 48 kHz,
1152 samples have a time slot of 24 msec which is the
time slot of a single AAU as shown in (d) in this
diagram. Therefore, the number of AAUs to be stored in
one pack having a time slot of 0.5005 sec is equivalent
to 20.854....

Since decoding of audio signals is performed for
each AAU, the number of AAUs in one pack is "21" or
"20" if that number is to be an integer. If twenty-one
AAUs are to be stored in one pack, audio information
lags video information by 3.5 msec per pack, and
if twenty AAUs are to be stored in one pack, audio
information leads video information by 20.5 msec per
pack, as shown in (b), (c) and (d) in Fig. 8. Accordingly,
forty-one packs each containing 21 AAUs and
seven packs each containing 20 AAUs, amounting to
forty-eight packs having just 1001 AAUs, coincide
with the time slot of 48 packs, 24.024 sec. The relative
time difference between a video packet VP and an
audio packet AP in one pack therefore differs from
one pack to another and returns to the original one in
a period of 48 packs. (Hereinafter, this period will be
called "48-pack period.")
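
The figures in this paragraph can be verified with exact rational arithmetic. The sketch below uses only the values stated in the text; the variable names are illustrative.

```python
from fractions import Fraction

PACK = Fraction(5005, 10000)  # 0.5005 sec, time slot of one pack
AAU = Fraction(1152, 48000)   # 1152 samples at 48 kHz = 0.024 sec

aaus_per_pack = PACK / AAU     # 1001/48, about 20.854
excess_21 = 21 * AAU - PACK    # audio longer than video by 3.5 msec
shortfall_20 = PACK - 20 * AAU # audio shorter than video by 20.5 msec

total_aaus = 41 * 21 + 7 * 20  # AAUs carried by one 48-pack period
audio_time = total_aaus * AAU  # 24.024 sec
video_time = 48 * PACK         # 24.024 sec
```

The exact ratio 1001/48 shows why the period is 48 packs: that is the smallest number of packs whose total duration equals a whole number of AAUs.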

To absorb this time difference, an AAU sequence
number is inserted in a data packet DP in each pack
as information indicating the location of that pack in
a stream of 48 packs which form one period.

Fig. 9 exemplifies the relation among this AAU
sequence number, the number of AAUs in one pack,
and the difference between presentation start times
for video signals and audio signals in a reproducing
apparatus.

It is apparent from Fig. 9 that AAU sequence
numbers "0" to "47" are given to identify packs
numbered from "1" to "48" in one 48-pack period, and AAU
sequence numbers "0" to "47" are also given to identify
subsequent packs in the next 48-pack period
starting from the one with a pack number "49". It is
also apparent that a pack having 20 AAUs is formed
in nearly a 7-pack period and every time this pack is
input to the reproducing apparatus, the difference
between the presentation start times for video signals
and audio signals becomes smaller. At the last pack
in the 48-pack period having an AAU sequence number
"47", this difference becomes zero.
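
Since Fig. 9 is not reproduced here, the following is only one schedule that is consistent with the description: a pack receives 20 AAUs whenever the accumulated audio excess has reached 20.5 msec, and 21 AAUs otherwise. The greedy rule, the starting offset of zero and the names are assumptions. Times are kept in 48 kHz samples so that the arithmetic stays exact.

```python
PACK_SAMPLES = 24024  # 0.5005 sec at 48 kHz
AAU_SAMPLES = 1152    # one AAU
PERIOD = 48           # packs per period

def build_schedule():
    """Return [(sequence number, AAU count, offset in samples)] and final offset.

    The offset is the accumulated audio excess over video at the start of
    each pack (an assumed definition of the time difference).
    """
    shortfall = PACK_SAMPLES - 20 * AAU_SAMPLES  # 984 samples = 20.5 msec
    schedule, offset = [], 0
    for seq in range(PERIOD):
        count = 20 if offset >= shortfall else 21
        schedule.append((seq, count, offset))
        offset += count * AAU_SAMPLES - PACK_SAMPLES
    return schedule, offset

schedule, final_offset = build_schedule()
short_packs = [seq for seq, count, _ in schedule if count == 20]
```

Under this rule the 20-AAU packs fall at sequence numbers 6, 13, 20, 27, 34, 41 and 47, i.e. roughly every seventh pack, and the offset is back at zero once the pack with sequence number 47 has been processed.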

A description will now be given of the reproducing
apparatus that reproduces video and audio signals 
and data, which are transmitted in such a stream of 
packs. 

Fig. 10 is a block diagram of this reproducing ap- 
paratus. 

In Fig. 10, a demultiplexer 31 separates an input 
stream of packs into a stream of video packets, a 
stream of audio packets and a stream of data pack- 
ets. The separated video packet VP is sent to a video 
decoder 32, the separated audio packet AP is sent to 
an FIFO (First-In First-Out) memory 33, and the
separated data packet DP is sent to a delay controller 34.
The data packet DP is also output directly as a data
output. The video decoder 32 decodes compressed
video signals and outputs the decoded video signals
to a D/A converter 35. The D/A converter 35 converts
the decoded video signals to analog video signals and 
outputs them as a video signal output. The FIFO 
memory 33 functions as a variable delay circuit 
whose delay amount is controlled by the delay con- 
troller 34, and outputs delayed audio packet data to 
an audio decoder 36. The audio decoder 36 decodes 
the compressed audio signals and sends the resultant 
signals to a D/A converter 37. The D/A converter 37 
converts the decoded audio signals to analog audio 
signals and outputs them as an audio signal output. 
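The routing performed by the demultiplexer 31 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Pack` layout, the `aau_seq` key and the destination containers are hypothetical stand-ins for the blocks of Fig. 10, not the actual pack format.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Pack:
    """Hypothetical pack: one video, one audio and one data packet."""
    video_packet: bytes
    audio_packet: bytes
    data_packet: dict            # assumed to carry {"aau_seq": int}


@dataclass
class Demultiplexer:
    """Splits a stream of packs into the three packet streams of Fig. 10."""
    to_video_decoder: list = field(default_factory=list)
    to_audio_fifo: deque = field(default_factory=deque)
    to_delay_controller: list = field(default_factory=list)
    data_output: list = field(default_factory=list)

    def feed(self, pack: Pack) -> None:
        # Video packet VP goes to the video decoder.
        self.to_video_decoder.append(pack.video_packet)
        # Audio packet AP goes to the FIFO memory.
        self.to_audio_fifo.append(pack.audio_packet)
        # Data packet DP goes to the delay controller ...
        self.to_delay_controller.append(pack.data_packet)
        # ... and is also passed through directly as the data output.
        self.data_output.append(pack.data_packet)
```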

The delay controller 34 refers to the AAU se- 
quence number in the data packet DP to acquire the 
difference between presentation start times for the 
video output and the audio output, and controls the 
delay amount of the FIFO memory 33 so that this
delay amount matches the acquired time difference.
Accordingly, the synchronous reproduction of
the video signals and audio signals is achieved.
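The control described above amounts to a table lookup keyed by the AAU sequence number. The sketch below assumes a 48-entry table of delays in milliseconds supplied by the caller. The values of Fig. 9 are not reproduced here, and the class and method names are hypothetical.

```python
from collections import deque

PACKS_PER_PERIOD = 48  # from the text: one period is 48 packs


class VariableDelayFifo:
    """FIFO that holds each audio packet back by the delay set at push time."""

    def __init__(self):
        self.delay_ms = 0.0
        self._queue = deque()    # entries: (release_time_ms, audio_packet)

    def push(self, now_ms, packet):
        self._queue.append((now_ms + self.delay_ms, packet))

    def pop_ready(self, now_ms):
        """Return, in order, the packets whose release time has been reached."""
        ready = []
        while self._queue and self._queue[0][0] <= now_ms:
            ready.append(self._queue.popleft()[1])
        return ready


class DelayController:
    """Sets the FIFO delay from the AAU sequence number in the data packet."""

    def __init__(self, fifo, delay_table_ms):
        if len(delay_table_ms) != PACKS_PER_PERIOD:
            raise ValueError("one delay entry is needed per pack in the period")
        self.fifo = fifo
        self.delay_table_ms = list(delay_table_ms)

    def on_data_packet(self, aau_seq):
        # The sequence number alone selects one of the 48 delay patterns.
        self.fifo.delay_ms = self.delay_table_ms[aau_seq % PACKS_PER_PERIOD]
```

In use, the controller would be given the data packet of a pack before the audio packet of the same pack is pushed into the FIFO, so that the packet is released with the delay belonging to its position in the period.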

As described above, the difference between the
presentation start times for the video signals and
audio signals included in a pack varies cyclically
with a period of a certain number of packs. Each pack
includes information on the number of that pack among
the plurality of packs that constitute one period. The
difference between the presentation start times for
the video signals and audio signals in each pack is
acquired by referring to this information on the pack
number, and the decoding timing is controlled based on
this time difference. The synchronous reproduction of
the video signals and audio signals can therefore be
accomplished easily.

As synchronous reproduction requires just 48 de- 
lay patterns for the FIFO memory 33, the circuit struc- 
ture becomes simple. Another advantage is that the 
buffer (FIFO memory 33 in this case) for controlling 
the timing of audio signals can have a small capacity. 
To match the time axis of video signals with the time
axis of audio signals, it is possible to control the time
axis of video signals alone or control both time axes
of video and audio signals.
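The choice of which time axis to control can be illustrated with a small helper that turns a start-time difference into per-stream delays. This is a sketch only. The sign convention (a positive value means audio would be presented earlier than video), the mode names and the function name are assumptions made for illustration.

```python
def split_delay(audio_lead_ms: float, mode: str = "audio") -> tuple:
    """Return (video_delay_ms, audio_delay_ms) that cancels the audio lead.

    mode "audio": delay the audio axis only (needs audio_lead_ms >= 0).
    mode "video": delay the video axis only (needs audio_lead_ms <= 0).
    mode "both":  delay whichever stream would otherwise be early.
    """
    if mode == "audio":
        if audio_lead_ms < 0:
            raise ValueError("audio lags; delaying audio alone cannot fix it")
        return (0.0, audio_lead_ms)
    if mode == "video":
        if audio_lead_ms > 0:
            raise ValueError("audio leads; delaying video alone cannot fix it")
        return (-audio_lead_ms, 0.0)
    if mode == "both":
        return (max(0.0, -audio_lead_ms), max(0.0, audio_lead_ms))
    raise ValueError("unknown mode: " + mode)
```

In every mode the audio delay minus the video delay equals the original audio lead, so the two presentation start times coincide after the delays are applied.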

The FIFO memory 33 as a variable delay circuit
may be positioned at the stage following the audio
decoder 36, or the buffer memory in the audio
decoder 36 may also be used as a variable delay circuit.

Although one pack contains a single data packet, a
single audio packet and a single video packet in the
above-described embodiment, one pack may contain a
plurality of packets of each type. Although two
channels of audio signals are processed, four channels
of audio signals may be dealt with. In the case of four
channels, the audio signals should be compressed, AAU
by AAU, over the same time span, two channels at a
time, before the audio signals are stored in a packet.
The same applies to multichannel audio signals in
general. Although the foregoing description of the
embodiment has been given with reference to the case
where video signals are compressed before
transmission, the present invention is also applicable
to the case where video signals are transmitted
uncompressed.

Further, although the embodiment has been
explained as the system which also has a function of
recording a stream of packs on a recording medium, the
advantages of the present invention can be expected
as long as the system, even without the recording
function, has a function of transmitting time-divided
video and audio signals.

In short, according to the method of transmitting
time-divided video and audio signals, the number of
unit audio data blocks to be put in one pack is set in
such a way that the difference between the
presentation start times for the stream of video data
and the stream of audio data in one pack in a
predetermined pack period becomes a predetermined
value, and positional information of the pack in the
predetermined pack period is affixed to the pack. In
addition, according to the method of reproducing
time-divided video and audio signals, the difference
between presentation start times for video signals and
audio signals in each pack is acquired by referring to
the positional information in a stream of packs
transferred by the above transmission method, and at
least one of the presentation start times for video
signals and audio signals in the stream of packs is
controlled so that the difference between the
presentation start times coincides with the difference
between the presentation start times corresponding to
the positional information.

With the above design, it is possible to provide a
synchronizing system with a simple structure, which
can accomplish synchronous reproduction without
complicating the control circuit for synchronizing
video and audio signals with each other.

Claims



1. A method of transmitting time-divided video and
audio signals, comprising the steps of:

coding a predetermined time slot of video
signals to form a stream of video data;

coding a predetermined number of samples
of audio signals to form a unit audio data
block and forming a stream of audio data
consisting of unit audio data blocks whose quantity
approximately corresponds to said predetermined
time slot;

performing time-division multiplexing on
said stream of video data and said stream of
audio data, storing resultant data in a pack having
said predetermined time slot, and transferring
video signals and audio signals in a stream of
packs; and

setting said quantity in such a way that a
difference between presentation start times for
said stream of video data and said stream of
audio data in one pack in a predetermined pack
period becomes a predetermined value, and affixing
positional information of said pack in said
predetermined pack period to said pack.

2. The method according to Claim 1, wherein said
predetermined pack period is a period for 48
packs.

3. The method according to Claim 1, wherein said
positional information indicates a number of said
pack in said predetermined pack period.

4. The method according to Claim 2, wherein said
positional information indicates a number of said
pack in said predetermined pack period.

5. A method of reproducing time-divided video and
audio signals, comprising a step of referring to
positional information from a stream of packs, to
control at least one of presentation start times for
video signals and audio signals in said stream of
packs in such a manner that a difference
between said presentation start times for said video
signals and audio signals coincides with a
difference between presentation start times
corresponding to said positional information,

said stream of packs being transferred by
a method of transferring time-divided video and
audio signals having the steps of coding a
predetermined time slot of video signals to form a
stream of video data, coding a predetermined
number of samples of audio signals to form a unit
audio data block and forming a stream of audio
data consisting of unit audio data blocks whose
quantity approximately corresponds to said
predetermined time slot, performing time-division
multiplexing on said stream of video data and
said stream of audio data, storing resultant data
in a pack having said predetermined time slot,
and transferring video signals and audio signals
in a stream of packs, and setting said quantity in
such a way that a difference between
presentation start times for said stream of video data and
said stream of audio data in one pack in a
predetermined pack period becomes a predetermined
value, and affixing said positional information of
said pack in said predetermined pack period to
said pack.

6. The method according to Claim 5, wherein said
predetermined pack period is a period for 48
packs.

7. The method according to Claim 5, wherein said
positional information indicates a number of said
pack in said predetermined pack period.

8. The method according to Claim 6, wherein said
positional information indicates a number of said
pack in said predetermined pack period.